Introduction to Data Science - Method and Tools

Stock Prediction


The research question we will focus on is:

Can we predict future stock trends?

During the project we came across additional questions, which we also answered:

  1. How much did the stock value change over 10 years?
  2. What was the correlation between the different stocks?

Importing Libraries -

Method 1 -

We pulled the data for 4 different stocks from "Yahoo Finance" using Selenium, then created one DataFrame that combines them all. The Selenium method can be run in the main Jupyter notebook.
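The combining step can be sketched like this. It is a minimal sketch, assuming each scraped stock already sits in its own pandas DataFrame; the Selenium scraping itself is not reproduced, and the helper name `combine_stocks` and the sample prices are made up for illustration.

```python
import pandas as pd


def combine_stocks(frames):
    """Stack one DataFrame per company into a single long DataFrame.

    `frames` maps a company name to that company's price table.
    """
    tagged = []
    for company, frame in frames.items():
        frame = frame.copy()          # don't modify the caller's DataFrame
        frame["Company"] = company    # remember which stock each row belongs to
        tagged.append(frame)
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    return pd.concat(tagged, ignore_index=True)


# Made-up rows, only to show the shape of the result
tesla = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "Close": [100.0, 101.0]})
netflix = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "Close": [300.0, 298.0]})
stock_data = combine_stocks({"Tesla": tesla, "Netflix": netflix})
print(stock_data)
```

Stacking the tables in long format (one row per date and company) keeps a single Company column, which matches the stock_data layout used in the rest of the notebook.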

Reading the data from the CSV file we already created using Method 1.
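Loading the saved file could look like the following sketch. The column name `Date`, the helper `load_stock_data`, and the inline sample rows are assumptions; in the notebook the argument would be the path of the CSV created by Method 1.

```python
import io

import pandas as pd


def load_stock_data(source):
    """Read the combined CSV; `source` is a file path or a file-like object."""
    df = pd.read_csv(source, parse_dates=["Date"])  # assumes a "Date" column
    # Keep each company's rows in chronological order
    return df.sort_values(["Company", "Date"]).reset_index(drop=True)


# Stand-in for the real file (made-up values)
csv_text = """Date,Close,Volume,Company
2020-01-03,101.0,1200,Tesla
2020-01-02,100.0,1000,Tesla
2020-01-02,300.0,800,Netflix
"""
stock_data = load_stock_data(io.StringIO(csv_text))
print(stock_data.dtypes)
```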

Basic Info - stock_data

  1. Open - The stock's price at the start of the trading day.
  2. High - The highest price the stock reached during the day.
  3. Low - The lowest price the stock reached during the day.
  4. Close - The stock's price at the end of the trading day.
  5. Adj Close - The closing price adjusted for corporate actions such as stock splits and dividends, so that prices from different dates can be compared.
  6. Volume - The number of shares traded during the day; higher volume means the stock is easier to buy and sell (more liquid).
  7. Company - The company's name.

Now let's explore and clean our data.

Correlation for each stock separately, comparing every column with every other column.
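The per-stock correlation can be sketched with a `groupby` on the Company column, computing one Pearson correlation matrix per company. The column list and the toy rows are assumptions for illustration.

```python
import pandas as pd

PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]


def correlation_per_company(df):
    """Return a dict mapping each company to its Pearson correlation matrix."""
    return {
        company: group[PRICE_COLUMNS].corr()
        for company, group in df.groupby("Company")
    }


# Made-up rows for a single company, just to exercise the function
toy = pd.DataFrame({
    "Open": [1.0, 2.0, 3.0, 4.0],
    "High": [1.5, 2.5, 3.5, 4.5],
    "Low": [0.5, 1.5, 2.5, 3.5],
    "Close": [1.2, 2.2, 3.2, 4.2],
    "Adj Close": [1.1, 2.1, 3.1, 4.1],
    "Volume": [500, 100, 400, 200],
    "Company": ["Tesla"] * 4,
})
matrices = correlation_per_company(toy)
print(matrices["Tesla"].round(2))
```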

Outliers - Using IQR Method.

Stocks can behave in extreme ways, both when rising and when falling, so we chose not to remove the outliers we identified.

Removing them could hurt our predictions by discarding important information.
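The IQR rule used to spot the outliers can be sketched as follows. It assumes the conventional multiplier k = 1.5 (Tukey's fences), which the notebook does not state; in line with the decision above, the function only flags outliers and never drops them.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; nothing is removed."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])  # made-up prices
mask = iqr_outlier_mask(close)
print(close[mask])  # only 100.0 is flagged; all rows stay in `close`
```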

EDA - stock_data

Describing the Close data of each company separately.
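A per-company summary of Close can be produced with `groupby` plus `describe`; the sketch below uses made-up prices.

```python
import pandas as pd

# Made-up Close prices for two companies, just to show the output shape
stock_data = pd.DataFrame({
    "Company": ["Tesla", "Tesla", "Tesla", "Netflix", "Netflix", "Netflix"],
    "Close": [10.0, 20.0, 30.0, 300.0, 310.0, 320.0],
})

# One row per company: count, mean, std, min, quartiles and max of Close
close_summary = stock_data.groupby("Company")["Close"].describe()
print(close_summary)
```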

Histograms of the "Close" column for each stock.

These show which price ranges each stock spent most of its time in: regardless of when a price occurred, each bar shows how often the price fell within that range (relative frequency).
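The relative frequency behind each histogram is each bin's count divided by the total number of days. A minimal NumPy sketch, with made-up prices and an arbitrary bin count:

```python
import numpy as np


def relative_frequency(close, bins=10):
    """Return the share of observations in each price bin, plus the bin edges."""
    counts, edges = np.histogram(close, bins=bins)
    return counts / counts.sum(), edges


close = np.array([10.0, 11.0, 11.5, 12.0, 30.0])  # made-up prices
freq, edges = relative_frequency(close, bins=4)
print(freq)
```

To draw the same bars with matplotlib, `plt.hist(close, bins=4, weights=np.ones(len(close)) / len(close))` scales each bar to a share of the total.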

The "Close" price of all stocks through the years.
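The first additional question, how much each stock changed over the 10 years, can also be answered numerically as the percent change between each company's first and last Close. A sketch with made-up prices; the `Date` column and the helper name are assumptions.

```python
import pandas as pd


def total_change_percent(df):
    """Percent change of Close from each company's first date to its last date."""
    ordered = df.sort_values("Date")
    grouped = ordered.groupby("Company")["Close"]
    first, last = grouped.first(), grouped.last()
    return (last - first) / first * 100


# Made-up prices; in the notebook `df` would be the full stock_data
stock_data = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-03", "2022-01-03", "2012-01-03", "2022-01-03"]),
    "Company": ["Tesla", "Tesla", "Netflix", "Netflix"],
    "Close": [10.0, 50.0, 100.0, 150.0],
})
change = total_change_percent(stock_data)
print(change)  # Netflix 50.0, Tesla 400.0
```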

From the "basic information" we mentioned earlier, it can clearly be seen that all the "stock_data" columns are related to each other except one, "Volume".

So we used a scatter plot to find the type of relationship between Volume and the other columns, and found that there is none.

Total volume of each stock traded each day

We can see that over the years the trading volume of all the stocks went down, except Tesla, which was traded more during COVID-19, so its volume grew.
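Daily volume is noisy, so summing it per calendar year can make the trend described above easier to check. This yearly grouping is our own choice, not necessarily what the notebook did; the helper name and the rows are made up.

```python
import pandas as pd


def yearly_volume(df):
    """Total traded volume per year (rows) and company (columns)."""
    years = df["Date"].dt.year.rename("Year")
    # Group by calendar year and company, then spread companies into columns
    return df.groupby([years, "Company"])["Volume"].sum().unstack("Company")


# Made-up daily rows
stock_data = pd.DataFrame({
    "Date": pd.to_datetime(["2019-06-03", "2019-07-01", "2020-06-01", "2019-06-03", "2020-06-01"]),
    "Company": ["Tesla", "Tesla", "Tesla", "Netflix", "Netflix"],
    "Volume": [100, 50, 300, 200, 120],
})
volume_by_year = yearly_volume(stock_data)
print(volume_by_year)
```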

Checking the correlation between the stocks on the "Close" column.

Here are also the df_new details, showing the "Close" column of each stock.
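One way df_new could be built is by pivoting the long stock_data table so that each company gets its own Close column, then calling `corr()`. How df_new was actually created in the notebook is not shown, so this is an assumed reconstruction with toy values.

```python
import pandas as pd

# Made-up long-format data: one row per (Date, Company)
dates = pd.date_range("2021-01-01", periods=5, freq="D")
stock_data = pd.DataFrame({
    "Date": list(dates) * 2,
    "Company": ["Tesla"] * 5 + ["Netflix"] * 5,
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 8.0, 6.0, 4.0, 2.0],
})

# Wide format: one "Close" column per company, aligned on Date
df_new = stock_data.pivot(index="Date", columns="Company", values="Close")

# Pairwise Pearson correlation between the companies' Close prices
close_corr = df_new.corr()
print(close_corr)
```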

We found a 0.79 correlation between Tesla and Netflix, so we decided to check the connection by plotting it on a scatter plot and seeing what we get.

We did not reach a concrete conclusion about the connection between them because the scatter plot did not show a clear visual relationship.
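One possible reason a 0.79 correlation does not show up as a clear pattern is that correlating raw price levels also picks up any trend both stocks share over the same period. The synthetic example below (made-up series, not the project's data) has two prices with independent day-to-day moves but a very high level correlation; correlating daily returns instead removes most of that shared-trend effect. This is a suggested extra check, not something the original analysis did.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(500)
# Two synthetic price series: both trend upward, but their daily noise is independent
a = pd.Series(100 + 0.5 * t + rng.normal(0, 1, t.size))
b = pd.Series(50 + 0.3 * t + rng.normal(0, 1, t.size))

level_corr = a.corr(b)  # correlation of price levels
# Daily simple returns; the first value is NaN and corr() skips it
return_corr = (a / a.shift(1) - 1).corr(b / b.shift(1) - 1)
print(round(level_corr, 3), round(return_corr, 3))
```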

Now that we have finished cleaning and exploring our data, it's time to create our model, starting with machine learning.

To do that, we will choose one stock from our existing group.

Machine learning:

We will use only the 'Close' column for the prediction model; the other price columns are not needed because they move almost identically to 'Close'.

Normalizing the values using MinMaxScaler, which by default scales them into the range [0, 1].
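Min-max scaling maps each value x to (x - min) / (max - min), so the smallest price becomes 0 and the largest becomes 1. The sketch below reimplements that formula in NumPy to make it visible; the class name `MinMax` is hypothetical, and in the notebook scikit-learn's `MinMaxScaler` does the same job (its `fit_transform` expects a 2-D array, hence the single column). One caution, offered as common practice rather than something the notebook states: fitting the scaler on the training part only keeps information about test prices out of training.

```python
import numpy as np


class MinMax:
    """Minimal stand-in for MinMaxScaler with its default output range (0, 1)."""

    def fit(self, x):
        self.min_value = x.min(axis=0)
        value_range = x.max(axis=0) - self.min_value
        # Avoid dividing by zero when a column is constant
        self.value_range = np.where(value_range == 0, 1.0, value_range)
        return self

    def transform(self, x):
        return (x - self.min_value) / self.value_range

    def inverse_transform(self, x_scaled):
        # Used later to turn scaled predictions back into prices
        return x_scaled * self.value_range + self.min_value


close = np.array([[100.0], [150.0], [200.0]])  # made-up prices as one column
scaler = MinMax().fit(close)
scaled = scaler.transform(close)
print(scaled.ravel())  # values 0.0, 0.5 and 1.0
```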

We split the data into train and test sets (90% for training and 10% for testing).
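For time series the split is usually done in time order rather than at random, so the model is tested on days that come after the ones it learned from. The sketch also builds the sliding windows an LSTM expects, shaped (samples, timesteps, features). The 60-day window, the placeholder series and the helper names are hypothetical; the notebook's actual window length is not stated.

```python
import numpy as np


def train_test_split_series(series, train_fraction=0.9):
    """Split in time order: the first 90% for training, the rest for testing."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]


def make_windows(series, window):
    """Inputs are `window` consecutive values; each target is the value right after."""
    x = np.array([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:]
    return x[..., np.newaxis], y  # add a features axis of size 1


scaled_close = np.linspace(0.0, 1.0, 100)  # placeholder for the scaled Close values
train, test = train_test_split_series(scaled_close)
x_train, y_train = make_windows(train, window=60)  # hypothetical window length
print(train.shape, test.shape, x_train.shape, y_train.shape)
```

When building the test windows, the last `window` values of the training data are usually prepended to the test set, so the first test day also has a full history.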

We will use an "LSTM" (Long Short-Term Memory) model, which we build, compile and train.

We chose it because our problem did not match the models presented to us in the lessons.

For example, we cannot use "Linear Regression", as can clearly be seen in the graph "Close Through the Years for all Stocks": the prices do not follow a straight line.

We investigated and found that an "LSTM" model, a type of recurrent neural network, is designed to handle sequential data collected over time.
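What lets an LSTM use data collected over time is its cell state, which gates update step by step as the sequence is read. The NumPy sketch below runs one LSTM layer forward over a single input window with random, untrained weights; it illustrates the mechanism and is not the project's model. In the notebook a library such as Keras (for example an `LSTM` layer followed by a `Dense` output layer) would handle building, compiling and training. The gate order and all sizes here are our own choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x_seq, W, U, b):
    """Run one LSTM layer over a sequence and return the final hidden state.

    x_seq: (timesteps, features); W: (4H, features); U: (4H, H); b: (4H,).
    The four row blocks hold the input, forget, candidate and output gates.
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden state (short-term memory)
    c = np.zeros(hidden)  # cell state (long-term memory)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                   # input gate: how much new info to write
        f = sigmoid(z[hidden:2 * hidden])         # forget gate: how much old memory to keep
        g = np.tanh(z[2 * hidden:3 * hidden])     # candidate values to write
        o = sigmoid(z[3 * hidden:])               # output gate: how much memory to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
features, hidden, timesteps = 1, 8, 60
W = rng.normal(0, 0.1, (4 * hidden, features))
U = rng.normal(0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)
w_out = rng.normal(0, 0.1, hidden)  # a Dense(1)-style linear read-out

# One scaled input window of 60 days (placeholder values)
window = np.linspace(0.2, 0.4, timesteps).reshape(timesteps, features)
h_last = lstm_forward(window, W, U, b)
prediction = float(w_out @ h_last)  # untrained, so the number itself is meaningless
print(h_last.shape, prediction)
```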


Showing the actual and predicted values in a DataFrame.
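A table of actual versus predicted prices can be assembled as below, after converting the scaled predictions back to prices (for example with the scaler's `inverse_transform`). The values and the helper name are made up, and the root-mean-square error (RMSE) line is an optional extra summary, not a metric the notebook reports.

```python
import numpy as np
import pandas as pd


def predictions_frame(dates, actual, predicted):
    """Put actual and predicted Close prices side by side, indexed by date."""
    return pd.DataFrame(
        {"Actual": actual, "Predictions": predicted},
        index=pd.Index(dates, name="Date"),
    )


# Made-up values, already converted back to prices
dates = pd.date_range("2022-01-03", periods=3, freq="D")
actual = np.array([100.0, 102.0, 101.0])
predicted = np.array([99.0, 103.0, 101.0])

valid = predictions_frame(dates, actual, predicted)
# RMSE: square root of the mean squared gap between actual and predicted
rmse = np.sqrt(np.mean((valid["Actual"] - valid["Predictions"]) ** 2))
print(valid)
print("RMSE:", rmse)
```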

We researched online which model to use to predict a stock price:

https://www.analyticsvidhya.com/blog/2021/10/machine-learning-for-stock-market-prediction-with-step-by-step-implementation/

https://neptune.ai/blog/predicting-stock-prices-using-machine-learning